---
title:  How to Design Data
description: >
  Notes on the How to Design Data recipe.
categories: posts
tags: [systematic-design]
---

<details markdown="1" id="table-of-contents">
<summary>
Table of Contents
</summary>

* TOC
{:toc}
</details>

## The Inherent Structure of the Information

A data definition establishes the represent/interpret relationship between
information and data. Identifying the inherent structure of the information is
therefore essential.

The structure of the information determines the kind of data definition used.
The data definition determines the structure of the templates and helps
determine the function examples/tests, which in turn structure much of the
final program design.

### Simple Atomic Data

Atomic data represents information that stops making sense once it is
disassembled. For instance, breaking the name of a city into individual
characters provides no new information about the city.

```racket
;; Time is Natural
;; interp. number of clock ticks since start of game

(define START-TIME 0)
(define OLD-TIME 1000)

;;; TEMPLATE
```

#### Guidance on test suite design

Two tests should suffice for simple atomic data. Additional tests are required
if there are multiple cases involved.
If the function produces a `Boolean`, you should have at least one test for
each result, `true` and `false`.
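
As a rough sketch of this guidance, the block below assumes DrRacket's Beginning Student Language and reuses the `Time` definition from above. The template name `fn-for-time` and the functions `tick` and `old?` are hypothetical, chosen only for illustration.

```racket
;; Time is Natural
;; interp. number of clock ticks since start of game
(define START-TIME 0)
(define OLD-TIME 1000)

;; Template (assumed form for atomic non-distinct data)
#;
(define (fn-for-time t)
  (... t))

;; Time -> Time
;; produce the time one clock tick later (hypothetical example function)
(check-expect (tick START-TIME) 1)
(check-expect (tick OLD-TIME) 1001)

(define (tick t)
  (+ t 1))

;; Time -> Boolean
;; produce true if t is at or past OLD-TIME (hypothetical example function)
(check-expect (old? START-TIME) false)  ; covers the false result
(check-expect (old? OLD-TIME) true)     ; covers the true result

(define (old? t)
  (>= t OLD-TIME))
```

`tick` has a single case, so two tests are enough. `old?` produces a `Boolean`, so it gets one test per result.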

### Intervals

Intervals are used to represent information within a certain range. They often
appear in itemizations, but can also appear alone.

```racket
;; Countdown is Integer[0, 10]
;; interp. the number of seconds remaining to liftoff
(define C1 10)  ; start
(define C2 5)   ; middle
(define C3 0)   ; end

;;; TEMPLATE
```

#### Guidance on test suite design

Provide sufficient examples to illustrate how the type represents information.
When writing tests for functions operating on intervals, be sure to test closed
boundaries as well as midpoints. As always, be sure to include enough tests to
check all other points of variance in behavior across the interval.
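
One way to apply this to `Countdown` is sketched below, again assuming Beginning Student Language. The template name `fn-for-countdown` and the function `next-countdown` are hypothetical and not part of the original recipe.

```racket
;; Countdown is Integer[0, 10]
;; interp. the number of seconds remaining to liftoff
(define C1 10)  ; start
(define C2 5)   ; middle
(define C3 0)   ; end

;; Template (assumed form for an interval)
#;
(define (fn-for-countdown cd)
  (... cd))

;; Countdown -> Countdown
;; produce the countdown one second later, staying at 0 after liftoff
;; (hypothetical example function)
(check-expect (next-countdown C1) 9)  ; closed upper boundary
(check-expect (next-countdown C2) 4)  ; midpoint
(check-expect (next-countdown 1) 0)   ; last step before the lower boundary
(check-expect (next-countdown C3) 0)  ; closed lower boundary

(define (next-countdown cd)
  (if (= cd 0)
      0
      (- cd 1)))
```

The behavior changes at 0, so that boundary and the value just above it each get their own test.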

### Enumerations

Enumerations are useful when the information to be represented consists of a
fixed number of distinct items, such as colors or letter grades. For
enumerations it is sometimes redundant to provide an interpretation and
nearly always redundant to provide examples. The example below includes the
interpretation but not the examples.

```racket
;; LightState is one of:
;;  - "red"
;;  - "yellow"
;;  - "green"
;; interp. the color of a traffic light

;; <examples are redundant for enumerations>

;;; TEMPLATE
```

#### Guidance on test suite design

Functions operating on enumerations should have (at least) as many tests as
there are cases in the enumeration.
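
For a small enumeration such as `LightState`, one test per case might look like the following sketch. It assumes Beginning Student Language; the template name `fn-for-light-state` and the function `next-light` are hypothetical.

```racket
;; LightState is one of:
;;  - "red"
;;  - "yellow"
;;  - "green"
;; interp. the color of a traffic light

;; Template (assumed form: one cond clause per case)
#;
(define (fn-for-light-state ls)
  (cond [(string=? "red" ls) (...)]
        [(string=? "yellow" ls) (...)]
        [(string=? "green" ls) (...)]))

;; LightState -> LightState
;; produce the state that follows ls (hypothetical example function)
(check-expect (next-light "red") "green")
(check-expect (next-light "yellow") "red")
(check-expect (next-light "green") "yellow")

(define (next-light ls)
  (cond [(string=? "red" ls) "green"]
        [(string=? "yellow" ls) "red"]
        [(string=? "green" ls) "yellow"]))
```
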
For large enumerations, though, it is not necessary to write out every case in
the data definition. Instead, write one or two cases, along with a comment
saying what the others are, where they are defined, and so on.

Defer writing templates for such large enumerations until a template is needed
for a specific function. At that point, include the specific cases that function
cares about. Be sure to include an `else` clause in the template to handle the
other cases. The same is true of tests: every specially handled case must be
tested, and one more test is required to check the `else` clause.
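
The sketch below illustrates the idea with a hypothetical large enumeration named `Key` and a hypothetical function `key-dx`; neither comes from the original notes. It assumes Beginning Student Language.

```racket
;; Key is one of:
;;  - "left"
;;  - "right"
;;  - ... every other key name, such as "a", "up" or " "
;; interp. the name of a key on the keyboard (hypothetical large enumeration)

;; Key -> Integer
;; produce the change in x position requested by key k:
;; -1 for "left", 1 for "right", 0 for any other key
(check-expect (key-dx "left") -1)  ; specially handled case
(check-expect (key-dx "right") 1)  ; specially handled case
(check-expect (key-dx "a") 0)      ; one test for the else clause

;; Template written for this function only: the two cases it cares
;; about plus an else clause for all the others
#;
(define (key-dx k)
  (cond [(string=? k "left") (...)]
        [(string=? k "right") (...)]
        [else (...)]))

(define (key-dx k)
  (cond [(string=? k "left") -1]
        [(string=? k "right") 1]
        [else 0]))
```

Only the two keys the function treats specially appear as clauses, so three tests cover its behavior.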

### Itemizations

An itemization describes data composed of two or more subclasses, at least one
of which is not a distinct item.

```racket
;; Bird is one of:
;;  - false
;;  - Number
;; interp. false means no bird, number is x position of bird

(define B1 false)
(define B2 3)

;;; TEMPLATE
```

#### Guidance on test suite design

Itemizations should have enough data examples to clearly illustrate how the type
represents information. Functions operating on itemizations should have at least
as many tests as there are cases in the itemization. If there are intervals in
the itemization, then there should be tests at all points of variance in the
interval. In the case of adjoining intervals, it is critical to test the
boundaries.
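
A minimal sketch for `Bird`, assuming Beginning Student Language, could look like this. The template name `fn-for-bird` and the function `advance-bird` are hypothetical.

```racket
;; Bird is one of:
;;  - false
;;  - Number
;; interp. false means no bird, number is x position of bird
(define B1 false)
(define B2 3)

;; Template (assumed form: one cond clause per subclass)
#;
(define (fn-for-bird b)
  (cond [(false? b) (...)]
        [(number? b) (... b)]))

;; Bird -> Bird
;; move the bird one unit to the right, if there is a bird
;; (hypothetical example function)
(check-expect (advance-bird B1) false)  ; the false case
(check-expect (advance-bird B2) 4)      ; the Number case

(define (advance-bird b)
  (cond [(false? b) false]
        [(number? b) (+ b 1)]))
```
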

#### Itemization of intervals

A common case is an itemization composed of two or more intervals. In
this case, functions operating on the data definition will usually need to be
tested at all the boundaries of closed intervals and at points between the
boundaries.

```racket
;; Reading is one of:
;;  - Number[> 30]
;;  - Number(5, 30]
;;  - Number[0, 5]
;; interp. distance in centimeters from bumper to obstacle
;;    Number[> 30]    is considered "safe"
;;    Number(5, 30]   is considered "warning"
;;    Number[0, 5]    is considered "dangerous"
(define R1 40)   ; safe
(define R2 0.9)  ; dangerous

;;; TEMPLATE
```
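
The following sketch shows what boundary and midpoint tests for `Reading` might look like, assuming Beginning Student Language. The template name `fn-for-reading` and the function `reading->status` are hypothetical, and the status strings are taken from the interpretation above.

```racket
;; Reading is one of:
;;  - Number[> 30]
;;  - Number(5, 30]
;;  - Number[0, 5]
;; interp. distance in centimeters from bumper to obstacle

;; Template (assumed form: one cond clause per interval)
#;
(define (fn-for-reading r)
  (cond [(< 30 r) (... r)]
        [(and (< 5 r) (<= r 30)) (... r)]
        [(and (<= 0 r) (<= r 5)) (... r)]))

;; Reading -> String
;; produce "safe", "warning" or "dangerous" for the given reading
;; (hypothetical example function)
(check-expect (reading->status 40) "safe")        ; inside Number[> 30]
(check-expect (reading->status 30) "warning")     ; closed boundary at 30
(check-expect (reading->status 17) "warning")     ; between the boundaries
(check-expect (reading->status 5) "dangerous")    ; closed boundary at 5
(check-expect (reading->status 0.9) "dangerous")  ; between the boundaries
(check-expect (reading->status 0) "dangerous")    ; closed boundary at 0

(define (reading->status r)
  (cond [(< 30 r) "safe"]
        [(and (< 5 r) (<= r 30)) "warning"]
        [(and (<= 0 r) (<= r 5)) "dangerous"]))
```

The tests at 30 and 5 check that each closed boundary falls into the intended interval where two intervals adjoin.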
